Overview

Dataset statistics

Number of variables: 18
Number of observations: 60000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 7
Duplicate rows (%): < 0.1%
Total size in memory: 8.2 MiB
Average record size in memory: 144.0 B

Variable types

Categorical: 6
Numeric: 11
Boolean: 1

Alerts

Dataset has 7 (< 0.1%) duplicate rows [Duplicates]
trip_distance is highly overall correlated with fare_amount and 2 other fields [High correlation]
ratecodeid is highly overall correlated with tolls_amount [High correlation]
fare_amount is highly overall correlated with trip_distance and 2 other fields [High correlation]
extra is highly overall correlated with vendorid [High correlation]
tip_amount is highly overall correlated with total_amount [High correlation]
tolls_amount is highly overall correlated with ratecodeid [High correlation]
total_amount is highly overall correlated with trip_distance and 3 other fields [High correlation]
duration is highly overall correlated with trip_distance and 2 other fields [High correlation]
vendorid is highly overall correlated with extra [High correlation]
mta_tax is highly overall correlated with improvement_surcharge and 1 other fields [High correlation]
improvement_surcharge is highly overall correlated with mta_tax and 1 other fields [High correlation]
congestion_surcharge is highly overall correlated with mta_tax and 1 other fields [High correlation]
store_and_fwd_flag is highly imbalanced (95.3%) [Imbalance]
payment_type is highly imbalanced (52.6%) [Imbalance]
mta_tax is highly imbalanced (89.0%) [Imbalance]
improvement_surcharge is highly imbalanced (95.1%) [Imbalance]
congestion_surcharge is highly imbalanced (65.2%) [Imbalance]
airport_fee is highly imbalanced (70.7%) [Imbalance]
duration is highly skewed (γ1 = 23.81338668) [Skewed]
passenger_count has 1143 (1.9%) zeros [Zeros]
trip_distance has 1121 (1.9%) zeros [Zeros]
extra has 27569 (45.9%) zeros [Zeros]
tip_amount has 18247 (30.4%) zeros [Zeros]
tolls_amount has 54413 (90.7%) zeros [Zeros]
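
The imbalance percentages in the alerts are consistent with an entropy-based score: one minus the Shannon entropy of the category frequencies, divided by the maximum possible entropy (log2 of the number of categories). The sketch below assumes that definition; the report itself does not state the formula, and `imbalance_score` is a hypothetical helper name. The counts are the payment_type and mta_tax category counts listed in the Variables section.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of category counts.

    0 means perfectly balanced; values near 1 mean one category dominates.
    This formula is an assumption, not something stated in the report.
    """
    if len(counts) < 2:
        # Undefined for fewer than two categories; 0.0 is an arbitrary convention.
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(len(counts))


payment_type = imbalance_score([44549, 14175, 832, 444])
mta_tax = imbalance_score([58233, 1179, 574, 14])
print(f"payment_type: {payment_type:.1%}")
print(f"mta_tax: {mta_tax:.1%}")
```

Under this assumption the two scores round to the 52.6% and 89.0% shown in the alerts.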

Reproduction

Analysis started: 2023-10-29 14:17:36.981957
Analysis finished: 2023-10-29 14:19:14.251201
Duration: 1 minute and 37.27 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
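
As a quick consistency check, the reported duration can be recomputed from the two timestamps with the standard library alone. This is only a sketch of the arithmetic; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps shown above
started = datetime.strptime("2023-10-29 14:17:36.981957", FMT)
finished = datetime.strptime("2023-10-29 14:19:14.251201", FMT)

elapsed = finished - started
# Split the elapsed seconds into whole minutes and remaining seconds
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```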

Variables

vendorid
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 2 (44804), 1 (15196)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 60000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

Most occurring characters

Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  60000  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  60000  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      44804  74.7%
1      15196  25.3%

passenger_count
Real number (ℝ)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5574167
Minimum: 0
Maximum: 6
Zeros: 1143
Zeros (%): 1.9%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.0276126
Coefficient of variation (CV): 0.65981865
Kurtosis: 4.2443528
Mean: 1.5574167
Median Absolute Deviation (MAD): 0
Skewness: 1.9987009
Sum: 93445
Variance: 1.0559876
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value  Count  Frequency (%)
1      38370  63.9%
2      12658  21.1%
3      3746   6.2%
4      2467   4.1%
0      1143   1.9%
5      1043   1.7%
6      573    1.0%

Minimum values

Value  Count  Frequency (%)
0      1143   1.9%
1      38370  63.9%
2      12658  21.1%
3      3746   6.2%
4      2467   4.1%
5      1043   1.7%
6      573    1.0%

Maximum values

Value  Count  Frequency (%)
6      573    1.0%
5      1043   1.7%
4      2467   4.1%
3      3746   6.2%
2      12658  21.1%
1      38370  63.9%
0      1143   1.9%
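
Because passenger_count has only seven distinct values, its descriptive statistics can be rebuilt from the frequency table alone. The sketch below does that with the counts copied from this section; it assumes the reported variance uses the sample (n - 1) denominator, which is the choice that reproduces the figures above.

```python
import math

# passenger_count value -> row count, copied from the frequency table above
counts = {0: 1143, 1: 38370, 2: 12658, 3: 3746, 4: 2467, 5: 1043, 6: 573}

n = sum(counts.values())                         # number of observations
total = sum(v * c for v, c in counts.items())    # the "Sum" statistic
mean = total / n
# Sample variance (n - 1 denominator); an assumption that matches the report
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                  # coefficient of variation
zeros_share = counts[0] / n

print(n, total)
print(round(mean, 7), round(variance, 7), round(std, 7), round(cv, 8))
print(f"{zeros_share:.1%}")
```

The same pattern works for any variable whose full value table is available; for variables with an "Other values" row the table is truncated and the statistics cannot be recovered this way.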

trip_distance
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 2405
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1211262
Minimum: 0
Maximum: 105.55
Zeros: 1121
Zeros (%): 1.9%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.5
Q1: 1.2
median: 2.2
Q3: 4.6125
95-th percentile: 17.3
Maximum: 105.55
Range: 105.55
Interquartile range (IQR): 3.4125

Descriptive statistics

Standard deviation: 4.9869271
Coefficient of variation (CV): 1.2100884
Kurtosis: 11.613484
Mean: 4.1211262
Median Absolute Deviation (MAD): 1.26
Skewness: 2.6013305
Sum: 247267.57
Variance: 24.869442
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    1121   1.9%
1                    659    1.1%
1.2                  641    1.1%
0.9                  639    1.1%
1.1                  619    1.0%
1.3                  615    1.0%
0.8                  605    1.0%
1.4                  601    1.0%
1.5                  539    0.9%
1.7                  538    0.9%
Other values (2395)  53423  89.0%

Minimum values

Value  Count  Frequency (%)
0      1121   1.9%
0.01   64     0.1%
0.02   43     0.1%
0.03   36     0.1%
0.04   20     < 0.1%
0.05   19     < 0.1%
0.06   26     < 0.1%
0.07   22     < 0.1%
0.08   21     < 0.1%
0.09   15     < 0.1%

Maximum values

Value   Count  Frequency (%)
105.55  1      < 0.1%
82.07   2      < 0.1%
66.42   1      < 0.1%
57.49   1      < 0.1%
56.41   1      < 0.1%
56.2    1      < 0.1%
54.7    1      < 0.1%
53.07   1      < 0.1%
51.4    1      < 0.1%
49.79   1      < 0.1%

ratecodeid
Real number (ℝ)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4230667
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 5.3955105
Coefficient of variation (CV): 3.7914671
Kurtosis: 319.73057
Mean: 1.4230667
Median Absolute Deviation (MAD): 0
Skewness: 17.848017
Sum: 85384
Variance: 29.111533
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value  Count  Frequency (%)
1      55324  92.2%
2      3095   5.2%
5      811    1.4%
3      462    0.8%
99     181    0.3%
4      126    0.2%
6      1      < 0.1%

Minimum values

Value  Count  Frequency (%)
1      55324  92.2%
2      3095   5.2%
3      462    0.8%
4      126    0.2%
5      811    1.4%
6      1      < 0.1%
99     181    0.3%

Maximum values

Value  Count  Frequency (%)
99     181    0.3%
6      1      < 0.1%
5      811    1.4%
4      126    0.2%
3      462    0.8%
2      3095   5.2%
1      55324  92.2%

store_and_fwd_flag
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.7 KiB

Value  Count  Frequency (%)
False  59688  99.5%
True   312    0.5%

pulocationid
Real number (ℝ)

Distinct: 212
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 159.52992
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 113
median: 161
Q3: 231
95-th percentile: 262
Maximum: 265
Range: 264
Interquartile range (IQR): 118

Descriptive statistics

Standard deviation: 67.24043
Coefficient of variation (CV): 0.42149104
Kurtosis: -0.9584374
Mean: 159.52992
Median Absolute Deviation (MAD): 68
Skewness: -0.14742022
Sum: 9571795
Variance: 4521.2754
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
132                 3669   6.1%
48                  2309   3.8%
79                  2143   3.6%
237                 1942   3.2%
161                 1927   3.2%
186                 1826   3.0%
170                 1781   3.0%
142                 1776   3.0%
162                 1770   2.9%
239                 1756   2.9%
Other values (202)  39101  65.2%

Minimum values

Value  Count  Frequency (%)
1      35     0.1%
4      140    0.2%
5      2      < 0.1%
6      1      < 0.1%
7      86     0.1%
8      1      < 0.1%
10     33     0.1%
11     1      < 0.1%
12     45     0.1%
13     356    0.6%

Maximum values

Value  Count  Frequency (%)
265    106    0.2%
264    900    1.5%
263    1582   2.6%
262    708    1.2%
261    444    0.7%
260    35     0.1%
259    1      < 0.1%
258    5      < 0.1%
257    3      < 0.1%
256    78     0.1%

dolocationid
Real number (ℝ)

Distinct: 249
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.78078
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 37
Q1: 100
median: 161
Q3: 233
95-th percentile: 262
Maximum: 265
Range: 264
Interquartile range (IQR): 133

Descriptive statistics

Standard deviation: 74.06101
Coefficient of variation (CV): 0.47238576
Kurtosis: -1.0492642
Mean: 156.78078
Median Absolute Deviation (MAD): 71
Skewness: -0.25046009
Sum: 9406847
Variance: 5485.0332
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
48                  1826   3.0%
236                 1803   3.0%
141                 1765   2.9%
170                 1601   2.7%
239                 1590   2.6%
263                 1580   2.6%
79                  1524   2.5%
161                 1521   2.5%
142                 1460   2.4%
237                 1448   2.4%
Other values (239)  43882  73.1%

Minimum values

Value  Count  Frequency (%)
1      418    0.7%
2      3      < 0.1%
3      3      < 0.1%
4      300    0.5%
5      2      < 0.1%
6      3      < 0.1%
7      430    0.7%
8      1      < 0.1%
9      6      < 0.1%
10     85     0.1%

Maximum values

Value  Count  Frequency (%)
265    488    0.8%
264    545    0.9%
263    1580   2.6%
262    1068   1.8%
261    392    0.7%
260    96     0.2%
259    16     < 0.1%
258    19     < 0.1%
257    39     0.1%
256    173    0.3%

payment_type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 1 (44549), 2 (14175), 4 (832), 3 (444)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 60000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

Most occurring characters

Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  60000  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  60000  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      44549  74.2%
2      14175  23.6%
4      832    1.4%
3      444    0.7%

fare_amount
Real number (ℝ)

Distinct: 658
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.174815
Minimum: -346
Maximum: 454.5
Zeros: 9
Zeros (%): < 0.1%
Negative: 608
Negative (%): 1.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: -346
5-th percentile: 5.8
Q1: 9.3
median: 14.2
Q3: 25.4
95-th percentile: 70
Maximum: 454.5
Range: 800.5
Interquartile range (IQR): 16.1

Descriptive statistics

Standard deviation: 20.738464
Coefficient of variation (CV): 0.97939294
Kurtosis: 19.023514
Mean: 21.174815
Median Absolute Deviation (MAD): 6.3
Skewness: 2.3297621
Sum: 1270488.9
Variance: 430.0839
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
70                  3080   5.1%
8.6                 2536   4.2%
7.9                 2470   4.1%
10                  2419   4.0%
7.2                 2394   4.0%
9.3                 2363   3.9%
11.4                2200   3.7%
10.7                2192   3.7%
6.5                 2190   3.6%
12.1                1940   3.2%
Other values (648)  36216  60.4%

Minimum values

Value    Count  Frequency (%)
-346     1      < 0.1%
-270.32  1      < 0.1%
-180     1      < 0.1%
-161.2   1      < 0.1%
-130     1      < 0.1%
-105.9   1      < 0.1%
-100.7   1      < 0.1%
-100     2      < 0.1%
-94      1      < 0.1%
-90      1      < 0.1%

Maximum values

Value   Count  Frequency (%)
454.5   1      < 0.1%
400     1      < 0.1%
346     1      < 0.1%
326.4   1      < 0.1%
296.3   1      < 0.1%
295.6   1      < 0.1%
272.5   1      < 0.1%
270.32  1      < 0.1%
267.6   1      < 0.1%
258     2      < 0.1%

extra
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.148344
Minimum: -6
Maximum: 9.75
Zeros: 27569
Zeros (%): 45.9%
Negative: 233
Negative (%): 0.4%
Memory size: 468.9 KiB

Quantile statistics

Minimum: -6
5-th percentile: 0
Q1: 0
median: 1
Q3: 2.5
95-th percentile: 3.5
Maximum: 9.75
Range: 15.75
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 1.5048418
Coefficient of variation (CV): 1.3104451
Kurtosis: 2.9345206
Mean: 1.148344
Median Absolute Deviation (MAD): 1
Skewness: 1.5589281
Sum: 68900.64
Variance: 2.2645488
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Common values

Value              Count  Frequency (%)
0                  27569  45.9%
1                  15925  26.5%
2.5                8190   13.7%
3.5                4713   7.9%
5                  1948   3.2%
3.75               319    0.5%
1.25               301    0.5%
-1                 217    0.4%
8.75               208    0.3%
6.25               191    0.3%
Other values (14)  419    0.7%

Minimum values

Value  Count  Frequency (%)
-6     2      < 0.1%
-5     11     < 0.1%
-2.5   1      < 0.1%
-1     217    0.4%
-0.5   2      < 0.1%
0      27569  45.9%
0.5    55     0.1%
1      15925  26.5%
1.14   1      < 0.1%
1.25   301    0.5%

Maximum values

Value  Count  Frequency (%)
9.75   3      < 0.1%
8.75   208    0.3%
8.5    8      < 0.1%
7.5    127    0.2%
7.25   13     < 0.1%
6.25   191    0.3%
6      87     0.1%
5      1948   3.2%
4.75   1      < 0.1%
3.75   319    0.5%

mta_tax
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 0.5 (58233), 0.0 (1179), -0.5 (574), 0.8 (14)

Length

Max length: 4
Median length: 3
Mean length: 3.0095667
Min length: 3

Characters and Unicode

Total characters: 180574
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
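
To illustrate how these character statistics arise, the sketch below recomputes them for mta_tax from its four category strings and their row counts, using Python's standard `unicodedata` module. The general-category codes `Nd`, `Po` and `Pd` correspond to the report's Decimal Number, Other Punctuation and Dash Punctuation; treating the values as the strings "0.5", "0.0", "-0.5" and "0.8" is an assumption based on how they are printed in this section.

```python
import unicodedata
from collections import Counter

# mta_tax category strings -> row counts, as listed in this section
value_counts = {"0.5": 58233, "0.0": 1179, "-0.5": 574, "0.8": 14}

n_rows = sum(value_counts.values())
# Total characters = sum over categories of (string length * row count)
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

# Tally characters by Unicode general category, weighted by row count
by_category = Counter()
for value, count in value_counts.items():
    for ch in value:
        by_category[unicodedata.category(ch)] += count

print(total_chars, round(mean_length, 7))
print(dict(by_category))
```

The totals agree with the figures reported for this variable: 180574 characters, of which 120000 are decimal digits, 60000 are the decimal point and 574 are the minus sign.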

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: 0.5
4th row: 0.5
5th row: 0.5

Common Values

Value  Count  Frequency (%)
0.5    58233  97.1%
0.0    1179   2.0%
-0.5   574    1.0%
0.8    14     < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0.5    58807  98.0%
0.0    1179   2.0%
0.8    14     < 0.1%

Most occurring characters

Value  Count  Frequency (%)
0      61179  33.9%
.      60000  33.2%
5      58807  32.6%
-      574    0.3%
8      14     < 0.1%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     120000  66.5%
Other Punctuation  60000   33.2%
Dash Punctuation   574     0.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      61179  51.0%
5      58807  49.0%
8      14     < 0.1%

Other Punctuation
Value  Count  Frequency (%)
.      60000  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      574    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  180574  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      61179  33.9%
.      60000  33.2%
5      58807  32.6%
-      574    0.3%
8      14     < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  180574  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      61179  33.9%
.      60000  33.2%
5      58807  32.6%
-      574    0.3%
8      14     < 0.1%

tip_amount
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1732
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4642077
Minimum: -0.9
Maximum: 211.5
Zeros: 18247
Zeros (%): 30.4%
Negative: 6
Negative (%): < 0.1%
Memory size: 468.9 KiB

Quantile statistics

Minimum: -0.9
5-th percentile: 0
Q1: 0
median: 2.52
Q3: 4.5125
95-th percentile: 12.45
Maximum: 211.5
Range: 212.4
Interquartile range (IQR): 4.5125

Descriptive statistics

Standard deviation: 4.5117117
Coefficient of variation (CV): 1.3023791
Kurtosis: 107.34285
Mean: 3.4642077
Median Absolute Deviation (MAD): 2.52
Skewness: 5.0946398
Sum: 207852.46
Variance: 20.355542
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    18247  30.4%
2                    2462   4.1%
1                    2104   3.5%
3                    1335   2.2%
5                    1087   1.8%
2.8                  715    1.2%
2.1                  578    1.0%
3.5                  572    1.0%
2.52                 563    0.9%
10                   548    0.9%
Other values (1722)  31789  53.0%

Minimum values

Value  Count  Frequency (%)
-0.9   1      < 0.1%
-0.01  5      < 0.1%
0      18247  30.4%
0.01   82     0.1%
0.02   16     < 0.1%
0.03   7      < 0.1%
0.04   2      < 0.1%
0.05   18     < 0.1%
0.06   4      < 0.1%
0.07   4      < 0.1%

Maximum values

Value  Count  Frequency (%)
211.5  1      < 0.1%
104    1      < 0.1%
99.99  1      < 0.1%
90     2      < 0.1%
85     1      < 0.1%
81     1      < 0.1%
80.55  1      < 0.1%
80     1      < 0.1%
77     1      < 0.1%
75     1      < 0.1%

tolls_amount
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 132
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.69923917
Minimum: -20.75
Maximum: 49.85
Zeros: 54413
Zeros (%): 90.7%
Negative: 42
Negative (%): 0.1%
Memory size: 468.9 KiB

Quantile statistics

Minimum: -20.75
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 6.55
Maximum: 49.85
Range: 70.6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.4697361
Coefficient of variation (CV): 3.5320334
Kurtosis: 30.804007
Mean: 0.69923917
Median Absolute Deviation (MAD): 0
Skewness: 4.5366338
Sum: 41954.35
Variance: 6.0995962
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0                   54413  90.7%
6.55                4792   8.0%
13.75               163    0.3%
11.75               160    0.3%
3                   42     0.1%
-6.55               31     0.1%
19.75               25     < 0.1%
16.75               23     < 0.1%
13.1                21     < 0.1%
11.55               19     < 0.1%
Other values (122)  311    0.5%

Minimum values

Value   Count  Frequency (%)
-20.75  1      < 0.1%
-19.75  1      < 0.1%
-15.75  1      < 0.1%
-13.75  4      < 0.1%
-11.75  3      < 0.1%
-8.36   1      < 0.1%
-6.55   31     0.1%
0       54413  90.7%
1       2      < 0.1%
2       1      < 0.1%

Maximum values

Value  Count  Frequency (%)
49.85  1      < 0.1%
39.75  1      < 0.1%
36.25  1      < 0.1%
34.05  2      < 0.1%
34     1      < 0.1%
33.8   1      < 0.1%
33.7   1      < 0.1%
32.1   1      < 0.1%
30     1      < 0.1%
29.65  1      < 0.1%

improvement_surcharge
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 1.0 (59210), -1.0 (606), 0.3 (171), 0.0 (11), -0.3 (2)

Length

Max length: 4
Median length: 3
Mean length: 3.0101333
Min length: 3

Characters and Unicode

Total characters: 180608
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0    59210  98.7%
-1.0   606    1.0%
0.3    171    0.3%
0.0    11     < 0.1%
-0.3   2      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1.0    59816  99.7%
0.3    173    0.3%
0.0    11     < 0.1%

Most occurring characters

Value  Count  Frequency (%)
0      60011  33.2%
.      60000  33.2%
1      59816  33.1%
-      608    0.3%
3      173    0.1%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     120000  66.4%
Other Punctuation  60000   33.2%
Dash Punctuation   608     0.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      60011  50.0%
1      59816  49.8%
3      173    0.1%

Other Punctuation
Value  Count  Frequency (%)
.      60000  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      608    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  180608  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      60011  33.2%
.      60000  33.2%
1      59816  33.1%
-      608    0.3%
3      173    0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  180608  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      60011  33.2%
.      60000  33.2%
1      59816  33.1%
-      608    0.3%
3      173    0.1%

total_amount
Real number (ℝ)

Distinct: 4296
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.667955
Minimum: -351
Maximum: 472.25
Zeros: 2
Zeros (%): < 0.1%
Negative: 608
Negative (%): 1.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: -351
5-th percentile: 10.5
Q1: 15.25
median: 21
Q3: 33.4
95-th percentile: 88.8
Maximum: 472.25
Range: 823.25
Interquartile range (IQR): 18.15

Descriptive statistics

Standard deviation: 25.351152
Coefficient of variation (CV): 0.8544961
Kurtosis: 12.939333
Mean: 29.667955
Median Absolute Deviation (MAD): 7.2
Skewness: 2.181676
Sum: 1780077.3
Variance: 642.6809
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
12.6                 940    1.6%
16.8                 907    1.5%
21                   754    1.3%
14                   554    0.9%
15.12                496    0.8%
15.96                487    0.8%
18.48                449    0.7%
13.3                 449    0.7%
14.28                440    0.7%
11.2                 437    0.7%
Other values (4286)  54087  90.1%

Minimum values

Value    Count  Frequency (%)
-351     1      < 0.1%
-287.57  1      < 0.1%
-166.2   1      < 0.1%
-146.8   1      < 0.1%
-145.25  1      < 0.1%
-122.45  1      < 0.1%
-111.75  1      < 0.1%
-111.15  1      < 0.1%
-105.35  1      < 0.1%
-101     2      < 0.1%

Maximum values

Value   Count  Frequency (%)
472.25  1      < 0.1%
401     1      < 0.1%
359.22  1      < 0.1%
351     1      < 0.1%
349.7   1      < 0.1%
338.16  1      < 0.1%
333.05  1      < 0.1%
300     1      < 0.1%
297.65  1      < 0.1%
297.07  1      < 0.1%

congestion_surcharge
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 2.5 (53185), 0.0 (6341), -2.5 (474)

Length

Max length: 4
Median length: 3
Mean length: 3.0079
Min length: 3

Characters and Unicode

Total characters: 180474
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.5
2nd row: 2.5
3rd row: 2.5
4th row: 0.0
5th row: 2.5

Common Values

Value  Count  Frequency (%)
2.5    53185  88.6%
0.0    6341   10.6%
-2.5   474    0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2.5    53659  89.4%
0.0    6341   10.6%

Most occurring characters

Value  Count  Frequency (%)
.      60000  33.2%
2      53659  29.7%
5      53659  29.7%
0      12682  7.0%
-      474    0.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     120000  66.5%
Other Punctuation  60000   33.2%
Dash Punctuation   474     0.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      53659  44.7%
5      53659  44.7%
0      12682  10.6%

Other Punctuation
Value  Count  Frequency (%)
.      60000  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      474    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  180474  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      60000  33.2%
2      53659  29.7%
5      53659  29.7%
0      12682  7.0%
-      474    0.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  180474  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.      60000  33.2%
2      53659  29.7%
5      53659  29.7%
0      12682  7.0%
-      474    0.3%

airport_fee
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 468.9 KiB
Counts: 0.0 (54264), 1.25 (5660), -1.25 (76)

Length

Max length: 5
Median length: 3
Mean length: 3.0968667
Min length: 3

Characters and Unicode

Total characters: 185812
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.25
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    54264  90.4%
1.25   5660   9.4%
-1.25  76     0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0.0    54264  90.4%
1.25   5736   9.6%

Most occurring characters

Value  Count   Frequency (%)
0      108528  58.4%
.      60000   32.3%
1      5736    3.1%
2      5736    3.1%
5      5736    3.1%
-      76      < 0.1%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     125736  67.7%
Other Punctuation  60000   32.3%
Dash Punctuation   76      < 0.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      108528  86.3%
1      5736    4.6%
2      5736    4.6%
5      5736    4.6%

Other Punctuation
Value  Count  Frequency (%)
.      60000  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      76     100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  185812  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      108528  58.4%
.      60000   32.3%
1      5736    3.1%
2      5736    3.1%
5      5736    3.1%
-      76      < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  185812  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      108528  58.4%
.      60000   32.3%
1      5736    3.1%
2      5736    3.1%
5      5736    3.1%
-      76      < 0.1%

duration
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 3574
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.260832
Minimum: 0
Maximum: 2596.6333
Zeros: 24
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 468.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.8333333
Q1: 6.9
median: 11.9
Q3: 19.7
95-th percentile: 36.85
Maximum: 2596.6333
Range: 2596.6333
Interquartile range (IQR): 12.8

Descriptive statistics

Standard deviation: 60.277089
Coefficient of variation (CV): 3.4921311
Kurtosis: 621.9327
Mean: 17.260832
Median Absolute Deviation (MAD): 5.8666667
Skewness: 23.813387
Sum: 1035649.9
Variance: 3633.3274
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0.1                  86     0.1%
6.35                 73     0.1%
7.3                  73     0.1%
6.166666667          73     0.1%
5.966666667          72     0.1%
5.7                  72     0.1%
4.533333333          71     0.1%
8.933333333          70     0.1%
0.06666666667        70     0.1%
6.483333333          68     0.1%
Other values (3564)  59272  98.8%

Minimum values

Value          Count  Frequency (%)
0              24     < 0.1%
0.01666666667  3      < 0.1%
0.03333333333  27     < 0.1%
0.05           42     0.1%
0.06666666667  70     0.1%
0.08333333333  63     0.1%
0.1            86     0.1%
0.1166666667   44     0.1%
0.1333333333   63     0.1%
0.15           41     0.1%

Maximum values

Value        Count  Frequency (%)
2596.633333  2      < 0.1%
2596.2       1      < 0.1%
1439.016667  1      < 0.1%
1438.783333  1      < 0.1%
1438.6       1      < 0.1%
1438.45      1      < 0.1%
1438.383333  1      < 0.1%
1438.3       1      < 0.1%
1438.283333  1      < 0.1%
1438.25      1      < 0.1%

Interactions

[Figures: pairwise interaction plots; images not preserved in this text export]
2023-10-29T15:19:05.043563image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:09.697428image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:14.063711image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:20.710875image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:27.081295image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:33.756510image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:40.471761image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:45.396502image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:49.903841image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:55.081543image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:00.212638image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:05.512698image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:10.145510image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:14.598001image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:21.333450image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:27.627539image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:34.329597image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:40.939032image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:45.768094image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:50.538459image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:55.517515image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:00.663061image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:05.850437image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:10.614249image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:15.188360image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:22.003136image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:28.171612image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:34.986204image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:41.394496image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:46.182189image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:51.035812image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:55.992408image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:01.149532image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:06.229377image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:11.050472image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:15.651921image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:22.561215image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:28.672647image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:35.532460image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:41.899388image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:46.529967image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:51.484648image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:18:56.419079image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:01.584543image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-10-29T15:19:06.637810image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

Correlations

variable | passenger_count | trip_distance | ratecodeid | pulocationid | dolocationid | fare_amount | extra | tip_amount | tolls_amount | total_amount | duration | vendorid | store_and_fwd_flag | payment_type | mta_tax | improvement_surcharge | congestion_surcharge | airport_fee
passenger_count | 1.000 | 0.045 | 0.081 | 0.003 | -0.006 | 0.068 | -0.040 | 0.021 | 0.063 | 0.068 | 0.067 | 0.254 | 0.073 | 0.037 | 0.049 | 0.012 | 0.026 | 0.022
trip_distance | 0.045 | 1.000 | 0.285 | -0.117 | -0.102 | 0.883 | 0.090 | 0.375 | 0.443 | 0.870 | 0.844 | 0.040 | 0.000 | 0.007 | 0.114 | 0.011 | 0.193 | 0.386
ratecodeid | 0.081 | 0.285 | 1.000 | -0.075 | -0.041 | 0.412 | -0.183 | 0.146 | 0.547 | 0.403 | 0.263 | 0.094 | 0.000 | 0.032 | 0.003 | 0.021 | 0.160 | 0.017
pulocationid | 0.003 | -0.117 | -0.075 | 1.000 | 0.084 | -0.122 | -0.029 | -0.027 | -0.083 | -0.113 | -0.112 | 0.025 | 0.005 | 0.038 | 0.033 | 0.009 | 0.205 | 0.386
dolocationid | -0.006 | -0.102 | -0.041 | 0.084 | 1.000 | -0.111 | -0.004 | -0.013 | -0.051 | -0.100 | -0.115 | 0.016 | 0.000 | 0.049 | 0.140 | 0.013 | 0.153 | 0.057
fare_amount | 0.068 | 0.883 | 0.412 | -0.122 | -0.111 | 1.000 | 0.046 | 0.391 | 0.460 | 0.976 | 0.887 | 0.055 | 0.000 | 0.159 | 0.334 | 0.250 | 0.326 | 0.469
extra | -0.040 | 0.090 | -0.183 | -0.029 | -0.004 | 0.046 | 1.000 | 0.093 | 0.053 | 0.089 | 0.067 | 0.872 | 0.097 | 0.092 | 0.130 | 0.093 | 0.170 | 0.330
tip_amount | 0.021 | 0.375 | 0.146 | -0.027 | -0.013 | 0.391 | 0.093 | 1.000 | 0.237 | 0.527 | 0.346 | 0.015 | 0.023 | 0.026 | 0.146 | 0.000 | 0.085 | 0.049
tolls_amount | 0.063 | 0.443 | 0.547 | -0.083 | -0.051 | 0.460 | 0.053 | 0.237 | 1.000 | 0.476 | 0.382 | 0.033 | 0.000 | 0.048 | 0.335 | 0.068 | 0.168 | 0.339
total_amount | 0.068 | 0.870 | 0.403 | -0.113 | -0.100 | 0.976 | 0.089 | 0.527 | 0.476 | 1.000 | 0.864 | 0.062 | 0.000 | 0.205 | 0.380 | 0.307 | 0.389 | 0.497
duration | 0.067 | 0.844 | 0.263 | -0.112 | -0.115 | 0.887 | 0.067 | 0.346 | 0.382 | 0.864 | 1.000 | 0.023 | 0.000 | 0.028 | 0.013 | 0.019 | 0.016 | 0.047
vendorid | 0.254 | 0.040 | 0.094 | 0.025 | 0.016 | 0.055 | 0.872 | 0.015 | 0.033 | 0.062 | 0.023 | 1.000 | 0.109 | 0.065 | 0.065 | 0.062 | 0.052 | 0.047
store_and_fwd_flag | 0.073 | 0.000 | 0.000 | 0.005 | 0.000 | 0.000 | 0.097 | 0.023 | 0.000 | 0.000 | 0.000 | 0.109 | 1.000 | 0.023 | 0.001 | 0.033 | 0.003 | 0.000
payment_type | 0.037 | 0.007 | 0.032 | 0.038 | 0.049 | 0.159 | 0.092 | 0.026 | 0.048 | 0.205 | 0.028 | 0.065 | 0.023 | 1.000 | 0.311 | 0.320 | 0.359 | 0.130
mta_tax | 0.049 | 0.114 | 0.003 | 0.033 | 0.140 | 0.334 | 0.130 | 0.146 | 0.335 | 0.380 | 0.013 | 0.065 | 0.001 | 0.311 | 1.000 | 0.587 | 0.666 | 0.247
improvement_surcharge | 0.012 | 0.011 | 0.021 | 0.009 | 0.013 | 0.250 | 0.093 | 0.000 | 0.068 | 0.307 | 0.019 | 0.062 | 0.033 | 0.320 | 0.587 | 1.000 | 0.626 | 0.250
congestion_surcharge | 0.026 | 0.193 | 0.160 | 0.205 | 0.153 | 0.326 | 0.170 | 0.085 | 0.168 | 0.389 | 0.016 | 0.052 | 0.003 | 0.359 | 0.666 | 0.626 | 1.000 | 0.316
airport_fee | 0.022 | 0.386 | 0.017 | 0.386 | 0.057 | 0.469 | 0.330 | 0.049 | 0.339 | 0.497 | 0.047 | 0.047 | 0.000 | 0.130 | 0.247 | 0.250 | 0.316 | 1.000
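
To make the link between this matrix and the "High correlation" alerts concrete, here is a minimal sketch that filters a handful of coefficients copied from the matrix above. The cutoff of 0.5 is an assumption: the report does not state its threshold, although this value is consistent with the alerts it lists. The helper name `strong_pairs` is hypothetical.

```python
# A few coefficients copied from the correlation matrix (one entry per pair).
corr = {
    ("trip_distance", "fare_amount"): 0.883,
    ("trip_distance", "duration"): 0.844,
    ("fare_amount", "total_amount"): 0.976,
    ("extra", "vendorid"): 0.872,
    ("ratecodeid", "tolls_amount"): 0.547,
    ("tip_amount", "total_amount"): 0.527,
    ("tip_amount", "fare_amount"): 0.391,
    ("passenger_count", "vendorid"): 0.254,
}

THRESHOLD = 0.5  # assumed cutoff; not stated anywhere in the report


def strong_pairs(corr, threshold=THRESHOLD):
    """Return (a, b, r) triples with |r| >= threshold, strongest first."""
    pairs = [(a, b, r) for (a, b), r in corr.items() if abs(r) >= threshold]
    return sorted(pairs, key=lambda item: abs(item[2]), reverse=True)


result = strong_pairs(corr)
for a, b, r in result:
    print(f"{a} ~ {b}: {r:.3f}")
```

Pairs below the cutoff, such as tip_amount and fare_amount (0.391), are dropped, which matches the absence of an alert for that pair.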

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
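
The per-column nullity count behind these displays can be sketched in a few lines. This is an illustrative sketch only: the records below are toy values rather than rows from the dataset, the helper `null_counts` is hypothetical, and a missing cell is assumed to be represented by `None`.

```python
# Toy records; values are illustrative and not taken from the dataset.
records = [
    {"vendorid": 2, "trip_distance": 0.97, "tip_amount": 0.0},
    {"vendorid": 1, "trip_distance": None, "tip_amount": 4.0},
    {"vendorid": 2, "trip_distance": 2.51, "tip_amount": None},
    {"vendorid": None, "trip_distance": None, "tip_amount": 3.28},
]


def null_counts(records, columns):
    """Count cells that are None (the assumed missing marker) per column."""
    return {
        col: sum(1 for rec in records if rec.get(col) is None)
        for col in columns
    }


counts = null_counts(records, ["vendorid", "trip_distance", "tip_amount"])
total_missing = sum(counts.values())
print(counts, total_missing)
```

For the dataset profiled here, every count would be zero, since the report lists 0 missing cells.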

Sample

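index | vendorid | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee | duration
0 | 2 | 1.0 | 0.97 | 1.0 | N | 161 | 141 | 2 | 9.3 | 1.00 | 0.5 | 0.00 | 0.0 | 1.0 | 14.30 | 2.5 | 0.00 | 8.433333
1 | 2 | 1.0 | 1.10 | 1.0 | N | 43 | 237 | 1 | 7.9 | 1.00 | 0.5 | 4.00 | 0.0 | 1.0 | 16.90 | 2.5 | 0.00 | 6.316667
2 | 2 | 1.0 | 2.51 | 1.0 | N | 48 | 238 | 1 | 14.9 | 1.00 | 0.5 | 15.00 | 0.0 | 1.0 | 34.90 | 2.5 | 0.00 | 12.750000
3 | 1 | 0.0 | 1.90 | 1.0 | N | 138 | 7 | 1 | 12.1 | 7.25 | 0.5 | 0.00 | 0.0 | 1.0 | 20.85 | 0.0 | 1.25 | 9.616667
4 | 2 | 1.0 | 1.43 | 1.0 | N | 107 | 79 | 1 | 11.4 | 1.00 | 0.5 | 3.28 | 0.0 | 1.0 | 19.68 | 2.5 | 0.00 | 10.833333
5 | 2 | 1.0 | 1.84 | 1.0 | N | 161 | 137 | 1 | 12.8 | 1.00 | 0.5 | 10.00 | 0.0 | 1.0 | 27.80 | 2.5 | 0.00 | 12.300000
6 | 2 | 1.0 | 1.66 | 1.0 | N | 239 | 143 | 1 | 12.1 | 1.00 | 0.5 | 3.42 | 0.0 | 1.0 | 20.52 | 2.5 | 0.00 | 10.450000
7 | 2 | 1.0 | 11.70 | 1.0 | N | 142 | 200 | 1 | 45.7 | 1.00 | 0.5 | 10.74 | 3.0 | 1.0 | 64.44 | 2.5 | 0.00 | 22.733333
8 | 2 | 1.0 | 2.95 | 1.0 | N | 164 | 236 | 1 | 17.7 | 1.00 | 0.5 | 5.68 | 0.0 | 1.0 | 28.38 | 2.5 | 0.00 | 14.933333
9 | 2 | 1.0 | 3.01 | 1.0 | N | 141 | 107 | 2 | 14.9 | 1.00 | 0.5 | 0.00 | 0.0 | 1.0 | 19.90 | 2.5 | 0.00 | 10.900000

index | vendorid | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee | duration
59990 | 2 | 1.0 | 1.55 | 1.0 | N | 68 | 230 | 2 | 27.5 | 0.0 | 0.5 | 0.00 | 0.0 | 1.0 | 31.50 | 2.5 | 0.0 | 36.050000
59991 | 1 | 2.0 | 1.50 | 1.0 | N | 249 | 79 | 1 | 10.7 | 2.5 | 0.5 | 1.47 | 0.0 | 1.0 | 16.17 | 2.5 | 0.0 | 11.066667
59992 | 1 | 1.0 | 0.50 | 1.0 | N | 68 | 264 | 2 | 5.1 | 2.5 | 0.5 | 0.00 | 0.0 | 1.0 | 9.10 | 2.5 | 0.0 | 2.500000
59993 | 1 | 3.0 | 1.00 | 1.0 | N | 107 | 264 | 1 | 7.2 | 2.5 | 0.5 | 2.20 | 0.0 | 1.0 | 13.40 | 2.5 | 0.0 | 4.950000
59994 | 1 | 2.0 | 1.90 | 1.0 | N | 234 | 48 | 2 | 19.1 | 2.5 | 0.5 | 0.00 | 0.0 | 1.0 | 23.10 | 2.5 | 0.0 | 27.666667
59995 | 1 | 2.0 | 6.30 | 1.0 | N | 48 | 255 | 1 | 30.3 | 3.5 | 0.5 | 7.05 | 0.0 | 1.0 | 42.35 | 2.5 | 0.0 | 29.966667
59996 | 1 | 1.0 | 1.00 | 1.0 | N | 237 | 142 | 1 | 7.9 | 2.5 | 0.5 | 2.00 | 0.0 | 1.0 | 13.90 | 2.5 | 0.0 | 6.666667
59997 | 1 | 1.0 | 2.00 | 1.0 | N | 142 | 233 | 1 | 12.8 | 2.5 | 0.5 | 4.20 | 0.0 | 1.0 | 21.00 | 2.5 | 0.0 | 14.066667
59998 | 1 | 1.0 | 0.90 | 1.0 | N | 263 | 262 | 1 | 7.2 | 2.5 | 0.5 | 2.25 | 0.0 | 1.0 | 13.45 | 2.5 | 0.0 | 5.983333
59999 | 1 | 1.0 | 1.80 | 1.0 | N | 140 | 142 | 1 | 11.4 | 2.5 | 0.5 | 3.10 | 0.0 | 1.0 | 18.50 | 2.5 | 0.0 | 11.566667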

Duplicate rows

Most frequently occurring

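index | vendorid | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee | duration | # duplicates
0 | 1 | 1.0 | 0.0 | 1.0 | N | 48 | 264 | 2 | 3.0 | 2.5 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 0.000000 | 4
1 | 1 | 1.0 | 0.0 | 1.0 | N | 239 | 264 | 2 | 3.0 | 2.5 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 0.000000 | 2
2 | 2 | 1.0 | 0.0 | 1.0 | N | 132 | 132 | 2 | 3.0 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 5.75 | 0.0 | 1.25 | 0.066667 | 2
3 | 2 | 1.0 | 0.0 | 5.0 | N | 265 | 265 | 1 | 50.0 | 0.0 | 0.0 | 10.2 | 0.0 | 1.0 | 61.20 | 0.0 | 0.00 | 0.100000 | 2
4 | 2 | 1.0 | 0.7 | 1.0 | N | 79 | 79 | 1 | 6.5 | 1.0 | 0.5 | 2.3 | 0.0 | 1.0 | 13.80 | 2.5 | 0.00 | 4.666667 | 2
5 | 2 | 2.0 | 0.0 | 5.0 | N | 265 | 265 | 1 | 120.0 | 0.0 | 0.0 | 24.2 | 0.0 | 1.0 | 145.20 | 0.0 | 0.00 | 0.133333 | 2
6 | 2 | 3.0 | 0.0 | 1.0 | N | 132 | 132 | 2 | 3.0 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 5.75 | 0.0 | 1.25 | 0.133333 | 2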
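
A "most frequently occurring" listing like the one above can be approximated by counting identical rows. The following is a minimal sketch, not the report's own implementation: the rows are toy tuples of (vendorid, trip_distance, fare_amount) with illustrative values, and `duplicate_counts` is a hypothetical helper.

```python
from collections import Counter

# Toy rows as (vendorid, trip_distance, fare_amount); illustrative values only.
rows = [
    (1, 0.0, 3.0),
    (2, 0.7, 6.5),
    (1, 0.0, 3.0),
    (2, 1.1, 7.9),
    (1, 0.0, 3.0),
    (2, 0.7, 6.5),
]


def duplicate_counts(rows):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


dupes = duplicate_counts(rows)
for row, n in dupes:
    print(row, n)
```

Rows must be hashable for `Counter` to work, which is why each record is a tuple here. With pandas, `DataFrame.duplicated(keep=False)` flags every member of a duplicate group and can serve as a starting point for the same count.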